Analysis of Fingertips data

This document analyses hospital admissions data for districts in England with Clean Air Zones (CAZs). The aim is to explore health trends in hospital admissions among children and young people aged under 19, and to visualise geographical and temporal patterns using publicly available datasets.

Setup

Package Installation

The packages required for data analysis and visualisation are installed from CRAN where needed and then loaded.

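# Use a fixed CRAN mirror so installation works non-interactively
options(repos = c(CRAN = "https://cloud.r-project.org"))

# remotes provides install_cran(); install it first if it is missing
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
pkgs <- c(
  "sf",
  "tidyverse",
  "here",
  "tmap",
  "data.table",
  "plotly",
  "knitr",
  "kableExtra"
)

# Install any packages that are missing or out of date, then attach them all
remotes::install_cran(pkgs)
sapply(pkgs, require, character.only = TRUE)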
        sf  tidyverse       here       tmap data.table     plotly      knitr 
      TRUE       TRUE       TRUE       TRUE       TRUE       TRUE       TRUE 
kableExtra 
      TRUE 
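If adding remotes as a dependency is undesirable, a base-R alternative is possible. The sketch below is not part of the original workflow: the helper name missing_pkgs() is hypothetical, and it only checks whether each package can be loaded, not whether it is up to date.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
# requireNamespace() returns FALSE (rather than erroring) when a package is absent.
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Usage: install only what is missing, then attach everything
demo_pkgs <- c("stats", "utils")  # base packages, so nothing needs installing here
to_install <- missing_pkgs(demo_pkgs)
if (length(to_install) > 0) install.packages(to_install)
invisible(lapply(demo_pkgs, library, character.only = TRUE))
```

With the real package vector, demo_pkgs would simply be replaced by pkgs.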

Loading Data

Hospital Admissions Data

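We use hospital admissions data sourced from the Fingertips platform, which publishes a wide range of public health indicators. The file is in Fingertips' long format: each row gives one indicator value for one area, sex, age band and time period, together with its 95% and 99.8% confidence limits, the count and denominator, and comparison flags. The column types are declared explicitly so that readr does not have to guess them.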
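For temporal trend plots, a numeric year is often handier than the text labels in Timeperiod (the data also include TimeperiodSortable as a numeric sort key). The helper below is a hypothetical sketch that assumes periods are financial-year labels such as "2019/20"; labels in any other format, for example multi-year ranges, return NA and would need their own rule.

```r
# Hypothetical helper: start year of financial-year labels like "2019/20".
# Assumes the "YYYY/YY" format; anything else becomes NA.
fy_start_year <- function(timeperiod) {
  is_fy <- grepl("^[0-9]{4}/[0-9]{2}$", timeperiod)
  out <- rep(NA_integer_, length(timeperiod))
  out[is_fy] <- as.integer(substr(timeperiod[is_fy], 1, 4))
  out
}

fy_start_year(c("2019/20", "2020/21", "2017/18 - 19/20"))
# returns c(2019L, 2020L, NA)
```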

admissions_data <- read_csv(
  "data_raw/hospital_admissions.csv",
  col_types = cols(
    IndicatorID = col_double(),
    IndicatorName = col_character(),
    ParentCode = col_character(),
    ParentName = col_character(),
    AreaCode = col_character(),
    AreaName = col_character(),
    AreaType = col_character(),
    Sex = col_character(),
    Age = col_character(),
    CategoryType = col_logical(),
    Category = col_logical(),
    Timeperiod = col_character(),
    Value = col_double(),
    LowerCI95.0limit = col_double(),
    UpperCI95.0limit = col_double(),
    LowerCI99.8limit = col_double(),
    UpperCI99.8limit = col_double(),
    Count = col_double(),
    Denominator = col_double(),
    Valuenote = col_character(),
    RecentTrend = col_character(),
    ComparedtoEnglandvalueorpercentiles = col_character(),
    ComparedtoParentvalueorpercentiles = col_character(),
    TimeperiodSortable = col_double(),
    Newdata = col_logical(),
    Comparedtogoal = col_logical(),
    Timeperiodrange = col_character()
  )
)
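With an explicit column specification, values that do not match a declared type do not raise an error: readr sets them to NA and records a parsing problem. This matters for columns declared as logical, such as Category and CategoryType, which would silently lose any text values. A minimal, self-contained check on a made-up file (hypothetical values) is sketched below; on the real data, problems(admissions_data) lists any such failures.

```r
library(readr)

# Made-up CSV: the last row has a text value in a column declared as logical
tmp <- tempfile(fileext = ".csv")
writeLines(c("AreaCode,Category", "area_1,", "area_2,some text"), tmp)

demo <- read_csv(
  tmp,
  col_types = cols(AreaCode = col_character(), Category = col_logical())
)

# One row per value that failed to parse (row, col, expected, actual, file)
parse_issues <- problems(demo)
print(parse_issues)
```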

Geographical Data

CAZ Boundaries

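Spatial boundaries for the Clean Air Zones are loaded from a GeoPackage file. These boundaries define the areas where interventions to improve air quality have been implemented. The layer holds seven multipolygon features with one attribute field, in British National Grid (EPSG:27700).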

caz_boundaries <- st_read(file.path(here(), "data_raw", "CAZ_boundaries.gpkg"))
Reading layer `CAZ_boundaries' from data source 
  `C:\temp_jf\CAZ-health-data-trends\data_raw\CAZ_boundaries.gpkg' 
  using driver `GPKG'
Simple feature collection with 7 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 356346.4 ymin: 99474.13 xmax: 465151.9 ymax: 565398
Projected CRS: OSGB36 / British National Grid
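sf only runs geometric predicates on layers that share a CRS, so any layer combined with the CAZ boundaries, such as the LAD boundaries, needs the same CRS. A minimal sketch of a hypothetical align_crs() helper, built on st_crs() and st_transform() from sf and applied to toy points with made-up coordinates:

```r
library(sf)

# Hypothetical helper: reproject `x` to the CRS of `target` only when they differ
align_crs <- function(x, target) {
  if (st_crs(x) == st_crs(target)) {
    return(x)
  }
  st_transform(x, st_crs(target))
}

# Toy data: one point in British National Grid (EPSG:27700, metres)
# and one in WGS84 (EPSG:4326, longitude/latitude)
bng_layer <- st_sfc(st_point(c(430000, 387000)), crs = 27700)
wgs_point <- st_sfc(st_point(c(-1.47, 53.38)), crs = 4326)

aligned <- align_crs(wgs_point, bng_layer)
st_crs(aligned) == st_crs(bng_layer)  # TRUE
```

In this workflow the call would be la_boundaries <- align_crs(la_boundaries, caz_boundaries), which leaves the layer untouched if it is already in British National Grid.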

Local Authority District Boundaries

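The May 2025 Local Authority District (LAD) boundaries (full resolution, clipped to the coastline) come from the ONS Open Geography Portal; a copy hosted on the project's GitHub release is downloaded if it is not already present in data_raw. These boundaries are used to identify which districts intersect the CAZ areas.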

if (!file.exists("data_raw/LAD_MAY_2025_UK_BFC_V2_6634550694215771101.gpkg")) {
  url <- "https://github.com/itsleeds/CAZ-health-data-trends/releases/download/v0/LAD_MAY_2025_UK_BFC_V2_6634550694215771101.gpkg"

  download.file(
    url,
    destfile = "data_raw/LAD_MAY_2025_UK_BFC_V2_6634550694215771101.gpkg",
    mode = "wb"
  )
}

la_boundaries <- st_read(
  "data_raw/LAD_MAY_2025_UK_BFC_V2_6634550694215771101.gpkg",
  quiet = TRUE
)
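The download-if-absent pattern above can be wrapped in a small function so that the file path is written only once. This is a sketch with a hypothetical helper name; it additionally creates the destination folder if needed and, like the original, uses mode = "wb" so the binary GeoPackage is not corrupted on Windows.

```r
# Hypothetical helper: download `url` to `destfile` unless the file already exists
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # Binary mode keeps files such as GeoPackages intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}
```

Because it returns the path, the result can be passed straight to st_read(), for example st_read(download_if_missing(url, lad_path), quiet = TRUE), where lad_path is a hypothetical variable holding the GeoPackage path.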

Pre-processing

Identifying Districts with CAZ Coverage

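To focus the analysis on relevant areas, we identify LADs that overlap CAZ boundaries. This is a spatial subset rather than a join: sf's [ method keeps every LAD that intersects at least one CAZ polygon, using st_intersects() as its default predicate, so subsequent analyses are restricted to districts affected by CAZ interventions. Note that st_intersects() also selects districts that merely share a boundary with a CAZ.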

# Keep LADs that intersect at least one CAZ polygon (default predicate: st_intersects)
CAZ_la_boundaries <- la_boundaries[caz_boundaries, ]
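The toy example below, using made-up square polygons in British National Grid, illustrates the default predicate: a district that only shares an edge with a zone is still selected. If that is unwanted, the stricter DE-9IM pattern "T********" (interiors intersect) can be applied with st_relate(). Whether the stricter rule suits the real CAZ boundaries is an assumption to check, not something the original analysis does.

```r
library(sf)

# Hypothetical helper: axis-aligned square polygon from corner coordinates
square <- function(x0, y0, x1, y1) {
  st_polygon(list(rbind(c(x0, y0), c(x1, y0), c(x1, y1), c(x0, y1), c(x0, y0))))
}

# Toy districts (metres, EPSG:27700): A and B share an edge, C is separate
districts <- st_sf(
  code = c("A", "B", "C"),
  geometry = st_sfc(
    square(0, 0, 1000, 1000),
    square(1000, 0, 2000, 1000),
    square(3000, 0, 4000, 1000),
    crs = 27700
  )
)

# Toy zone inside A whose eastern edge lies on the A/B boundary
zone <- st_sf(geometry = st_sfc(square(500, 200, 1000, 800), crs = 27700))

# Default subset (st_intersects): B is kept because it touches the zone's edge
by_intersects <- districts[zone, ]
by_intersects$code  # "A" "B"

# Stricter subset: keep districts whose interior intersects the zone's interior
interior_hit <- lengths(st_relate(districts, zone, pattern = "T********")) > 0
by_interior <- districts[interior_hit, ]
by_interior$code  # "A"
```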

Visualising CAZ Locations

We create an interactive map (tmap's "view" mode) showing the CAZ boundaries in semi-transparent blue over the districts they intersect in grey. This gives a geographical overview of the study area.

tmap_mode("view")

# Districts in grey, with the CAZ boundaries drawn on top in semi-transparent blue
tm_shape(CAZ_la_boundaries) +
  tm_polygons(fill = "grey70") +
  tm_shape(caz_boundaries) +
  tm_polygons(fill = "blue", fill_alpha = 0.6)

Subsetting Admissions Data

The admissions data are filtered to records from the districts identified above, with two manual adjustments to that list: Sheffield (E08000019) is added and Gateshead (E08000037) is removed. This restricts the health trend analysis to the populations affected by CAZ policies.

CAZ_admissions <- admissions_data |>
  filter(
    # Districts intersecting a CAZ, plus Sheffield (E08000019)
    AreaCode %in% c(CAZ_la_boundaries$LAD25CD, "E08000019"),
    # Exclude Gateshead (E08000037)
    AreaCode != "E08000037"
  )
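The district list comes from a May 2025 boundary file, while the admissions extract may use the area codes of an earlier geography. Any target code that matches no admissions rows simply disappears from CAZ_admissions without warning, so it is worth listing such codes explicitly. A sketch with a hypothetical helper and made-up codes:

```r
# Hypothetical helper: target area codes that have no rows in the admissions data
# (setdiff() also drops duplicates from the result)
codes_without_data <- function(target_codes, data_codes) {
  setdiff(target_codes, data_codes)
}

# Toy example with made-up codes
target_codes <- c("AREA_1", "AREA_2", "AREA_3")
data_codes <- c("AREA_1", "AREA_1", "AREA_3")
codes_without_data(target_codes, data_codes)  # "AREA_2"
```

On the real data the call would be codes_without_data(setdiff(c(CAZ_la_boundaries$LAD25CD, "E08000019"), "E08000037"), admissions_data$AreaCode); an empty result means every selected district has admissions records.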